Distributed multi-query optimization of continuous clustering queries

نویسندگان

  • Sobhan Badiozamany
  • Tore Risch
چکیده

This work addresses the problem of sharing execution plans for queries that continuously cluster streaming data to provide an evolving summary of the data stream. This is challenging since clustering is an expensive task, there might be many clustering queries running simultaneously, each continuous query has a long life time span, and the execution plans often overlap. Clustering is similar to conventional grouped aggregation but cluster formation is more expensive than group formation, which makes incremental maintenance more challenging. The goal of this work is to minimize response time of continuous clustering queries with limited resources through multi-query optimization. To that end, strategies for sharing execution plans between continuous clustering queries are investigated and the architecture of a system is outlined that optimizes the processing of multiple such queries. Since there are many clustering algorithms, the system should be extensible to easily incorporate user defined clustering algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning

In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur large communication cost. As queries run continuously, the precious bandwidths would be aggressively consumed without careful optimization of operator ordering and placement. In this paper, we focus on the optimization of continuous multi-join...

متن کامل

Advanced Data Stream Sharing

Using multi-query optimization for sharing common work among multiple queries requires theidentification of shareable query components. This kind of optimization is particularly effective indistributed data stream management systems (DSMSs) with multiple continuous queries running con-currently over long periods of time. In this paper, we introduce an abstract property tree (APT) an...

متن کامل

Relational Databases Query Optimization using Hybrid Evolutionary Algorithm

Optimizing the database queries is one of hard research problems. Exhaustive search techniques like dynamic programming is suitable for queries with a few relations, but by increasing the number of relations in query, much use of memory and processing is needed, and the use of these methods is not suitable, so we have to use random and evolutionary methods. The use of evolutionary methods, beca...

متن کامل

Scalable multi-query optimization for exploratory queries over federated scientific databases

The diversity and large volumes of data processed in the Natural Sciences today has led to a proliferation of highlyspecialized and autonomous scientific databases with inherent and often intricate relationships. As a user-friendly method for querying this complex, ever-expanding network of sources for correlations, we propose exploratory queries. Exploratory queries are loosely-structured, hen...

متن کامل

Network-aware optimization in distributed data stream management systems

The management of streaming data in distributed environments is gaining importance in many application areas such as sensor networks and e-science. This is mainly due to both, the need for immediate reactions to important events in input streams as well as the requirement to efficiently handle enormous data volumes that are generated, for example, by modern scientific experiments and observatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014